Filters: Discipline, Is Peer Reviewed, Series Title, Reading Level, Year (From / To), Content Type, Item Type, Is Full-Text Available, Subject, Publisher, Source, Donor, Language, Place of Publication, Contributors, Location
140,231 results for "Pattern recognition"
Dense Trajectories and Motion Boundary Descriptors for Action Recognition
This paper introduces a video representation based on dense trajectories and motion boundary descriptors. Trajectories capture the local motion information of the video. A dense representation guarantees good coverage of foreground motion as well as of the surrounding context. A state-of-the-art optical flow algorithm enables robust and efficient extraction of dense trajectories. As descriptors we extract features aligned with the trajectories to characterize shape (point coordinates), appearance (histograms of oriented gradients) and motion (histograms of optical flow). Additionally, we introduce a descriptor based on motion boundary histograms (MBH), which relies on differential optical flow. The MBH descriptor is shown to consistently outperform other state-of-the-art descriptors, in particular on real-world videos that contain a significant amount of camera motion. We evaluate our video representation in the context of action classification on nine datasets, namely KTH, YouTube, Hollywood2, UCF sports, IXMAS, UIUC, Olympic Sports, UCF50 and HMDB51. On all datasets our approach outperforms current state-of-the-art results.
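As a rough illustration of the motion boundary idea (not the authors' released implementation), the sketch below computes a single pair of MBH histograms from two grayscale frames using OpenCV's Farneback optical flow; the paper uses a stronger flow algorithm and aggregates histograms in space-time cells along each dense trajectory.

    # Hedged sketch: histogram the spatial gradients of each optical-flow
    # component (MBHx, MBHy); constant camera motion has near-zero flow
    # gradient, so it is largely suppressed.
    import cv2
    import numpy as np

    def mbh_descriptor(prev_gray, next_gray, n_bins=8):
        # Dense optical flow between consecutive frames (Farneback here).
        flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        parts = []
        for c in range(2):                          # MBHx and MBHy
            gx = cv2.Sobel(flow[..., c], cv2.CV_32F, 1, 0, ksize=3)
            gy = cv2.Sobel(flow[..., c], cv2.CV_32F, 0, 1, ksize=3)
            mag, ang = cv2.cartToPolar(gx, gy)      # flow-gradient magnitude/orientation
            hist, _ = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi),
                                   weights=mag)
            parts.append(hist / (hist.sum() + 1e-8))
        return np.concatenate(parts)                # concatenated MBHx + MBHy histograms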
Deep Learning for Generic Object Detection: A Survey
Object detection, one of the most fundamental and challenging problems in computer vision, seeks to locate object instances from a large number of predefined categories in natural images. Deep learning techniques have emerged as a powerful strategy for learning feature representations directly from data and have led to remarkable breakthroughs in the field of generic object detection. Given this period of rapid evolution, the goal of this paper is to provide a comprehensive survey of the recent achievements in this field brought about by deep learning techniques. More than 300 research contributions are included in this survey, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics. We finish the survey by identifying promising directions for future research.
FairMOT: On the Fairness of Detection and Re-identification in Multiple Object Tracking
Multi-object tracking (MOT) is an important problem in computer vision with a wide range of applications. Formulating MOT as multi-task learning of object detection and re-ID in a single network is appealing since it allows joint optimization of the two tasks and enjoys high computational efficiency. However, we find that the two tasks tend to compete with each other, an issue that needs to be carefully addressed. In particular, previous works usually treat re-ID as a secondary task whose accuracy is heavily affected by the primary detection task. As a result, the network is biased toward the primary detection task, which is not fair to the re-ID task. To solve this problem, we present a simple yet effective approach termed FairMOT, based on the anchor-free object detection architecture CenterNet. Note that it is not a naive combination of CenterNet and re-ID. Instead, we present a number of detailed designs, identified through thorough empirical studies, that are critical for achieving good tracking results. The resulting approach achieves high accuracy for both detection and tracking, and outperforms state-of-the-art methods by a large margin on several public datasets. The source code and pre-trained models are released at https://github.com/ifzhang/FairMOT .
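To make the "single network, two tasks" layout concrete, here is a hedged PyTorch sketch (not the released FairMOT code; the real model uses a DLA-34 backbone and additional offset heads): one shared feature map feeds a CenterNet-style detection head and a parallel re-ID embedding head, so both tasks are optimized jointly.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class JointDetReID(nn.Module):
        """Toy stand-in: shared backbone + parallel detection and re-ID heads."""
        def __init__(self, num_classes=1, emb_dim=128):
            super().__init__()
            self.backbone = nn.Sequential(                # placeholder for DLA-34
                nn.Conv2d(3, 64, 3, stride=4, padding=1), nn.ReLU(),
                nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
            self.heatmap = nn.Conv2d(64, num_classes, 1)  # object-center heatmap
            self.size = nn.Conv2d(64, 2, 1)               # box width/height
            self.reid = nn.Conv2d(64, emb_dim, 1)         # per-location identity embedding

        def forward(self, x):
            feat = self.backbone(x)                       # shared by both tasks
            return {"heatmap": torch.sigmoid(self.heatmap(feat)),
                    "size": self.size(feat),
                    "reid": F.normalize(self.reid(feat), dim=1)}

    out = JointDetReID()(torch.randn(1, 3, 256, 256))
    print({k: v.shape for k, v in out.items()})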
Background modeling and foreground detection for video surveillance
Background modeling and foreground detection are important steps in video processing, used to robustly detect moving objects in challenging environments. This requires effective methods for dealing with dynamic backgrounds and illumination changes, as well as algorithms that meet real-time and low-memory requirements. Incorporating both established and new ideas, Background Modeling and Foreground Detection for Video Surveillance provides a complete overview of the concepts, algorithms, and applications related to background modeling and foreground detection.
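For orientation, a minimal foreground-detection loop using OpenCV's Gaussian-mixture background subtractor (MOG2), one of the classical model families this kind of survey covers; the input file name is a placeholder.

    import cv2

    cap = cv2.VideoCapture("surveillance.mp4")        # placeholder input video
    subtractor = cv2.createBackgroundSubtractorMOG2(history=500,
                                                    varThreshold=16,
                                                    detectShadows=True)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Update the per-pixel background model and get a foreground mask
        # (shadows are marked with an intermediate gray level).
        fg_mask = subtractor.apply(frame)
        fg_mask = cv2.medianBlur(fg_mask, 5)          # suppress isolated noise
        cv2.imshow("foreground", fg_mask)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()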
CornerNet: Detecting Objects as Paired Keypoints
We propose CornerNet, a new approach to object detection where we detect an object bounding box as a pair of keypoints, the top-left corner and the bottom-right corner, using a single convolutional neural network. By detecting objects as paired keypoints, we eliminate the need for designing a set of anchor boxes commonly used in prior single-stage detectors. In addition to our novel formulation, we introduce corner pooling, a new type of pooling layer that helps the network better localize corners. Experiments show that CornerNet achieves a 42.2% AP on MS COCO, outperforming all existing one-stage detectors.
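The corner-pooling idea can be sketched in a few lines of PyTorch (this follows the description above, not the authors' CUDA kernels): for a top-left corner map, each location takes the maximum over everything to its right and everything below it, so evidence from an object's top and left edges accumulates at its top-left corner.

    import torch

    def top_left_corner_pool(feat):
        # feat: (N, C, H, W) feature map
        # Running max over all pixels to the right of each location.
        right_max = torch.flip(torch.cummax(torch.flip(feat, dims=[3]),
                                            dim=3).values, dims=[3])
        # Running max over all pixels below each location.
        bottom_max = torch.flip(torch.cummax(torch.flip(feat, dims=[2]),
                                             dim=2).values, dims=[2])
        return right_max + bottom_max                 # the two maps are summed

    x = torch.randn(1, 4, 8, 8)
    print(top_left_corner_pool(x).shape)              # torch.Size([1, 4, 8, 8])

The bottom-right corner pool is the mirror image: max over everything to the left and everything above.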
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
We propose a technique for producing ‘visual explanations’ for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent and explainable. Our approach, Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of any target concept (say ‘dog’ in a classification network, or a sequence of words in a captioning network) flowing into the final convolutional layer to produce a coarse localization map highlighting the important regions in the image for predicting the concept. Unlike previous approaches, Grad-CAM is applicable to a wide variety of CNN model families: (1) CNNs with fully-connected layers (e.g., VGG), (2) CNNs used for structured outputs (e.g., captioning), and (3) CNNs used in tasks with multi-modal inputs (e.g., visual question answering) or reinforcement learning, all without architectural changes or re-training. We combine Grad-CAM with existing fine-grained visualizations to create a high-resolution class-discriminative visualization, Guided Grad-CAM, and apply it to image classification, image captioning, and visual question answering (VQA) models, including ResNet-based architectures. In the context of image classification models, our visualizations (a) lend insights into failure modes of these models (showing that seemingly unreasonable predictions have reasonable explanations), (b) outperform previous methods on the ILSVRC-15 weakly-supervised localization task, (c) are robust to adversarial perturbations, (d) are more faithful to the underlying model, and (e) help achieve model generalization by identifying dataset bias. For image captioning and VQA, our visualizations show that even non-attention-based models learn to localize discriminative regions of the input image. We devise a way to identify important neurons through Grad-CAM and combine it with neuron names (Bau et al., in Computer Vision and Pattern Recognition, 2017) to provide textual explanations for model decisions. Finally, we design and conduct human studies to measure whether Grad-CAM explanations help users establish appropriate trust in predictions from deep networks, and show that Grad-CAM helps untrained users successfully discern a ‘stronger’ deep network from a ‘weaker’ one even when both make identical predictions. Our code is available at https://github.com/ramprs/grad-cam/ , along with a demo on CloudCV (Agrawal et al., in: Mobile cloud visual media computing, pp 265-290. Springer, 2015) ( http://gradcam.cloudcv.org ) and a video at http://youtu.be/COjUB9Izk6E .
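A minimal sketch of the core Grad-CAM computation in PyTorch (the official implementation is at the repository above; the choice of VGG-16 and of its final feature-extractor output as the target maps is purely illustrative): gradients of the class score are global-average-pooled over the convolutional feature maps to weight them, and the weighted sum is passed through a ReLU and upsampled.

    import torch
    import torch.nn.functional as F
    from torchvision import models

    # Illustrative backbone; any CNN with conv feature maps works.
    model = models.vgg16(weights="IMAGENET1K_V1").eval()

    def grad_cam(image, class_idx=None):
        # image: (1, 3, 224, 224), normalized with ImageNet statistics
        feats = model.features(image)                 # conv feature maps A_k
        scores = model.classifier(torch.flatten(model.avgpool(feats), 1))
        if class_idx is None:
            class_idx = scores.argmax(dim=1).item()   # explain the top prediction
        # Gradients of the class score with respect to the feature maps.
        grads = torch.autograd.grad(scores[0, class_idx], feats)[0]
        weights = grads.mean(dim=(2, 3), keepdim=True)            # alpha_k (GAP of grads)
        cam = F.relu((weights * feats).sum(dim=1, keepdim=True))  # weighted sum + ReLU
        cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear",
                            align_corners=False)
        return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

    heatmap = grad_cam(torch.randn(1, 3, 224, 224))   # coarse localization map

For sharper maps one would hook the last convolutional layer itself (14x14 in VGG-16) rather than the post-pooling output used here for brevity.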